Extracting Idiomatic Hungarian Verb Frames
Abstract
We describe a machine learning method for collecting idiomatic fixed-stem verb frames. First, we collect frequent frame candidates from the output of a partial parser; second, we apply an idiomaticity metric to the list to select the most idiomatic frames. The extracted frames will be translated into English and used as a resource in a Hungarian-to-English machine translation system.
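The two steps named in the abstract (collect frequent candidates, then rank them by idiomaticity) can be illustrated with a minimal sketch. The abstract does not say which idiomaticity metric is used, so pointwise mutual information (PMI) between verb stem and frame stands in here purely as an assumption. The function name `rank_frames`, the `min_freq` threshold, the slot label `ACC`, and the toy clauses are all hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def rank_frames(clauses, min_freq=2):
    """Rank (verb stem, frame) pairs by PMI.

    clauses: iterable of (verb_stem, frame) pairs, where frame is a hashable
    description of the verb's dependents (here: a tuple of slot labels).
    PMI is an assumed stand-in for the unspecified idiomaticity metric.
    """
    pair_counts = Counter(clauses)
    verb_counts = Counter()
    frame_counts = Counter()
    for (verb, frame), n in pair_counts.items():
        verb_counts[verb] += n
        frame_counts[frame] += n
    total = sum(pair_counts.values())

    scored = []
    for (verb, frame), n in pair_counts.items():
        # Step 1: keep only frequent candidates.
        if n < min_freq:
            continue
        # Step 2: score how strongly the verb and the frame attract each other.
        pmi = log2(n * total / (verb_counts[verb] * frame_counts[frame]))
        scored.append(((verb, frame), n, pmi))

    # Highest score first.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Toy input (hypothetical): "figyelembe vesz" = "take into consideration".
toy_clauses = (
    [("vesz", ("figyelembe", "ACC"))] * 3
    + [("vesz", ("ACC",))] * 2
    + [("olvas", ("ACC",))] * 5
)
ranked = rank_frames(toy_clauses)
for (verb, frame), freq, score in ranked:
    print(verb, frame, freq, round(score, 3))
```

On this toy input the frame with the fixed element "figyelembe" ranks first, because that frame occurs only with "vesz", whereas the bare accusative frame is shared across verbs.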
Similar resources
A Unified Method for Extracting Simple and Multiword Verbs with Valence Information and Application for Hungarian
We present a method for extracting verb-centered constructions (VCCs) from corpora. In our framework, simple and multiword verbs, with or without valence, are all VCCs. They are treated uniformly, from "to breathe" to "to take something into consideration". To extract VCCs, we represent the corpus as a sequence of clauses that contain a verb together with all its NP dependents. Th...
Extracting Translation Verb Frames
We describe a method for extracting translation verb frames (parallel subcategorization frames) from a parallel dependency treebank. The extracted frames constitute an important part of the machine translation dictionary for a structural machine translation system. We evaluate our method independently, using a manually annotated test dataset, and conclude that the bottleneck of the method lies in q...
Using chunked corpora for the acquisition of collocations and idiomatic expressions
This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with ...
A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
We present an annotation study on a representative dataset of literal and idiomatic uses of infinitive-verb compounds in German newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus res...
Publication date: 2006